ABSTRACT

The National Household Education Survey (NHES) was 
conducted for the first time in 1991 as a way to collect data on the 
early childhood education experiences of young children and 
participation in adult education. Because the NHES methodology is
relatively new, field tests were necessary. A large field test of
approximately 15,000 households was conducted during the fall of 1989
to examine several methodological issues. This report describes the
approach used to increase the number of Black and Hispanic American 
households and youth in the sample. During the field test, an 
approach that uses demographic information at the telephone exchange 
level to develop sampling strata was used to oversample Black and 
Hispanic American households. The yield of the field test sample 
design was compared to that which would have been expected without 
oversampling, and the effects of oversampling on the precision of
survey estimates are reported. Oversampling did improve the precision 
of estimates of characteristics of Blacks and Hispanic Americans. 
Precision losses for overall totals and for non-Black and 
non-Hispanic American estimates are the price paid in order to 
improve the reliability of the estimates for Blacks and Hispanic 
Americans. Two tables and two figures report study data, and two 
appendixes contain screening ratios used to locate households of 
various groups and expected and observed sample sizes. (SLD)

Foreword

The National Household Education Survey 
(NHES) represents a major new initiative of the 
National Center for Education Statistics (NCES). 
Between February and May of 1991, the NHES was 
fielded for the first time as a mechanism for collecting
data on two different sectors of education policy
interest: the early childhood education experience of
young children and participation in adult education.
Because the NHES methodology is relatively new and
relies on some innovative approaches, a field test of
the methodology was an essential first step in the
development of the survey. Many of the methods
evaluated during the 1989 NHES field test were
adopted for the full-scale survey. 

A large field test of approximately 15,000 
households was conducted during the fall of 1989. A 
number of methodological issues associated with 
collecting and analyzing data on education issues from 
a random digit dialing telephone survey were 
examined. This report is one of five that describe the 
1989 NHES Field Test experience. The five reports 
are the first in a series of technical publications 
pertaining to the design and conduct of the NHES 
that NCES hopes to continue in the years to come. 
NCES believes that the reports contained in this series
will provide users of the NHES data with a better 
understanding of the NHES methodology and that 
they will assist the survey design efforts of others. 

The first report in this series, Overview of the
National Household Education Survey Field Test, 
describes the design of the field test and the outcomes 
of the field test data collection activities. It reports on 
the response rates obtained, both unit and item, and 
the burden associated with survey participation. Each 
of the next four reports in the series focuses on a 
specific issue that was examined in the 1989 NHES 
field test. 



The second report, Telephone Undercoverage 
Bias of 14- to 21-Year-Olds and 3- to 5-Year-Olds,
analyzes data from the Current Population Survey to 
identify the extent of telephone coverage for two 
distinct populations of interest and the bias associated 
with this type of undercoverage for estimates of school 
dropouts and early childhood education program 
participation. Methods for adjusting survey estimates 
to partially reduce this bias are developed and 
evaluated. 

The third report, Multiplicity Sampling for
Dropouts in the NHES Field Test, examines a
technique that was used to increase the coverage of 
14- to 21-year-olds and to capture more dropouts in 
the sample. The report describes the effectiveness of 
the multiplicity sample in achieving these goals. 

The fourth report, Proxy Reporting of Dropout
Status in the NHES Field Test, focuses on
measurement errors arising from the use of proxy
respondents. During the 1989 Field Test, a
knowledgeable household member was used as a
source of information on the school enrollment of
each sampled 14- to 21-year-old in the household. In
addition, 14- to 21-year-olds were asked to report on
their own school enrollment. The report describes the
correspondence between the responses given by proxy
respondents and those provided by the youths
themselves.

The fifth report, Effectiveness of Oversampling 
Blacks and Hispanics in the NHES Field Test, 
describes the approach used to increase the number 
of black and Hispanic households/youth in the 
sample. During the field test, an approach that uses 
demographic information at the telephone exchange 
level to develop sampling strata was used to 
oversample black and Hispanic households. The 
report examines the yield of the field test sample 
design versus that which would have been expected 
without oversampling. The effects of oversampling on 
the precision of survey estimates are reported.

Introduction

During the fall of 1989, the Field Test of the 
National Household Education Survey (NHES) was 
conducted by the National Center for Education 
Statistics (NCES) to explore the feasibility of 
collecting education data by telephone from a sample 
of persons in their households. The NHES is the 
first major attempt by NCES to go beyond its 
traditional surveys, which rely upon school-based data 
collection systems and are typically conducted by mail
or in-person data collection methods.

A household survey has the potential to provide the
types of data needed to study current issues in
education, particularly those which cannot be
adequately addressed through a school-based survey. 
Such issues include dropping out of school, adult and 
continuing education, preschool education, the status 
of former teachers, and home-based education. 
Consequently, the NHES methodology may greatly 
enhance the scope of issues covered by the data 
collection activities of NCES. 

Since the NHES data collection methods were 
untested for education surveys, the Field Test was 
developed to evaluate the use of this approach. Two 
topics of broad policy interest were included in the 
Field Test: the early childhood education 
characteristics of 3- to 5-year-olds, and the 
educational status of 14- to 21-year-olds with a 
special focus on youth who dropped out of school 
before completing high school. By including both of 
these study areas in the Field Test, the ability to use 
the NHES to study multiple, complex topics,
employing different sampling requirements and
respondent rules, could be evaluated.

Westat, Inc., under contract with NCES, conducted 
all of the Field Test interviews using computer- 
assisted telephone interviewing (CATI) methods. 
The use of CATI methods made sampling 
respondents for interviews easy and nearly invisible to 
the telephone respondent, an important benefit when 
several persons may be sampled in a household. 
CATI also directed the interviewers through complex 
skip patterns and provided the opportunity to 
incorporate edit checks to help resolve inconsistencies 
in the data while the respondents were still on the 
telephone. Another major advantage of CATI was 
that data analysis could begin soon after data 
collection ended, because data entry and many of the 
edit checks were done during the interview. 



The sampling scheme used in the Field Test was a 
variant of the Mitofsky-Waksberg random digit dial 
(RDD) procedure, in which every residential
telephone number has the same chance of being 
drawn into the sample. Because of the need for 
more precise estimates of blacks and Hispanics, 
special sampling methods were used to increase the 
sample size for these persons. The design for the 
Field Test was essentially the same as planned for a 
full-scale NHES study, except the overall sample size 
was smaller. 

The sample resulted in collecting data from 15,037
households representing all civilian, noninstitutionalized
persons in the 50 states and the District of
Columbia. Although only persons living in telephone 
households could be sampled for the Field Test, 
adjustments were made in the weights so that the 
estimates of persons living in both telephone and 
nontelephone households could be produced. 

Respondents in sampled households were asked a 
series of screening questions. This interview, called 
the Screener, was used to enumerate all the members 
of the household, determine the eligibility of each 
person in the household for the early childhood 
education (3- to 5-year-olds) and youth (14- to 21- 
year-olds) studies, and obtain some data on the 
characteristics of the household. A total of 4,374 
households had at least one person enumerated in 
the Screener who was eligible for an extended 
interview. The response rate to the Screener was 79 
percent. 

The early childhood education interview was 
conducted with the parent or guardian who knew the 
most about each sampled 3- to 5-year-old child's care 
and education. Accordingly, this interview was called 
the Parent Interview. Of the 1,551 children identified 
in the Screener, parents completed interviews for 
1,530 children, a completion rate of 99 percent. 

If the household contained any 14- to 21-year-olds, 
then a Household Respondent Interview (HRI) was 
attempted for each of these members. The HRI was 
used to determine the current and previous 
educational status of the youth; this interview could 
be completed by any adult household member who 
knew about the educational activities of the youth, 
including self-reports by the youth. Of the 4,441 
youths identified in the Screener, HRIs were 
completed for 4,313 youths, for a 97 percent 
completion rate. As part of a special methodological 
study of multiplicity sampling, mothers in a 
subsample of the households were asked to complete
the HRI for their 14- to 21-year-old children who did
not live in their household. These youth are included 
in the numbers stated above. 

A Youth Interview (YI) was then attempted for a 
subsample of the 14- to 21-year-olds in the
household. All the youths who were not currently 
enrolled in school and did not have a high school 
diploma or equivalent (as reported in the HRI), and 
a sample of all other youths, were targeted for the 
YI. The interview contained more detailed items on 
the educational experiences of the youth that could 
only be answered by the youth. Of the 1,863 youths 
sampled, 1,604 completed the YI, a completion rate 
of 86 percent. These numbers include a sample of 
133 youth who did not live in the sampled 
households, but were included through the multipli- 
city sample when their mothers completed the HRI. 

This report describes the impact of the test of 
sampling procedures used to increase the sample size 
for blacks and Hispanics in the Field Test, one of 
several methodological studies undertaken in the Field 
Test. The Field Test is described in greater detail in 
another report entitled Overview Report on the 1989
National Household Education Survey Field Test, the 
first in a series of reports on the Field Test. The 
Overview Report describes the sample design, the 
data collection methods and instruments, the response 
rates, and other salient aspects of the collection and 
analysis process for the Field Test. 

Procedures for oversampling to improve the reliability 
of estimates for blacks and Hispanics in a random 
digit dial telephone survey are relatively innovative 
and untested. If oversampling proved to be 
successful in the Field Test, the same technique could 
be used to improve the estimates for blacks and
Hispanics in future full-scale surveys. Oversampling 
was accomplished by doubling the sampling rates in 
geographic areas (telephone exchanges) which had 
large proportions of either black or Hispanic 
residents. 

The next section includes a brief review of the 
oversampling design used for the NHES Field Test. 
Section 3 provides an evaluation of the oversampling 
design by examining the sample yields and comparing 
the yield with the expected sample sizes if no 
oversampling had been carried out. Furthermore, 
this section includes a study of precision levels of the 
estimates coming from the NHES sample compared 
to the estimates expected from an equal probability
sample of the same total size. The last section
summarizes the findings of the study and suggests 
procedures for future studies. 

Description of the Oversampling 
Design 

The methods available for oversampling in a random 
digit dial telephone survey are limited. The method 
used in the Field Test was developed and described in 
detail by Mohadjer. The procedure utilizes a data
tape produced by the Donnelley Marketing 
Information Services. Donnelley sells computer tapes 
containing 1980 census characteristics for telephone 
exchanges which can be used to stratify the exchanges 
by the density of the black and Hispanic population. 
The exchanges in different strata can then be sampled 
at different rates to oversample those with high 
concentrations of blacks and Hispanics. 

This approach differs from the typical RDD sample 
which does not employ oversampling. In the typical 
design, a list of all telephone area codes and existing
prefix numbers is obtained from AT&T and a frame
of all possible first 8 digits of telephone numbers is
established. A simple random sample is then selected
from this frame. A random 2-digit number is added 
to the sampled 8-digit number to create a first-stage 
sample of telephone numbers. 

The first-stage numbers are then dialed and each
number is identified as residential or nonresidential. 
The numbers which are residential are retained as the 
first stage sampled units, while the nonresidential 
numbers are discarded. The second stage of sampl- 
ing consists of taking each of the first 8 digits of the 
retained sampled telephone numbers and adding a set 
of random 2-digit numbers to form the desired sample 
of 10-digit telephone numbers. Since the second 
stage sample is generated using the first 8 digits of 
the retained number, these first 8 digits are called 
clusters. 
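
The two-stage selection described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the survey's production procedure: the frame of 8-digit prefixes is made up, the `toy_is_residential` check stands in for the result of actually dialing a number, and all function names are hypothetical.

```python
import random


def first_stage_numbers(prefixes, n_first, rng):
    """Draw a simple random sample of 8-digit prefixes and append 2 random digits."""
    sampled = rng.sample(prefixes, n_first)
    return [prefix + f"{rng.randrange(100):02d}" for prefix in sampled]


def retain_clusters(numbers, is_residential):
    """Keep the first 8 digits of each first-stage number found to be residential."""
    return [number[:8] for number in numbers if is_residential(number)]


def second_stage_numbers(cluster, n_numbers, rng):
    """Form 10-digit numbers within a retained cluster from distinct 2-digit endings."""
    endings = rng.sample(range(100), n_numbers)
    return [cluster + f"{ending:02d}" for ending in endings]


def toy_is_residential(number):
    # Hypothetical stand-in for dialing the number; purely illustrative.
    return int(number[-2:]) % 4 == 0


# Hypothetical frame of 8-digit prefixes (area code + prefix + 1 digit).
rng = random.Random(1989)
frame = [f"{value:08d}" for value in range(20255500, 20255700)]

stage_one = first_stage_numbers(frame, 50, rng)
clusters = retain_clusters(stage_one, toy_is_residential)
sample = [n for c in clusters for n in second_stage_numbers(c, 5, rng)]
```

Each retained 8-digit cluster contributes the same number of second-stage telephone numbers in this sketch; the cluster sizes used in the Field Test are not stated here and are not implied by the value 5.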

To specify the oversampling procedures, preliminary 
research was conducted to determine 1) how to 
stratify telephone exchanges and 2) the best rate for 
oversampling the strata. A summary of this research 
is given in appendix B. Telephone exchanges with 20 
percent or more blacks or 20 percent or more 
Hispanics were identified and oversampled at a rate 
of 2 to 1. Clusters with 20 percent or more blacks or
20 percent or more Hispanics will be referred to as
high minority clusters, and the remainder of clusters
will be referred to as low minority clusters.

For the NHES Field Test, a sample of 1,000 
residential clusters was required, with the high 
minority clusters having twice the chance of selection 
as the low minority clusters. The first step was to 
stratify the clusters. A random sample of 10,000 
clusters from the AT&T tape was selected and 
matched against the Donnelley tape to create a file
with information on the percent of the population that
was black and the percent that was Hispanic for each
cluster. Only 10,000 clusters were matched in order to
control the cost of the matching process.

The following distribution was observed for the initial 
sample of 10,000 exchanges. 

Not found on Donnelley           2,044
Classified as low minority       5,928
Classified as high minority      2,028
Total                           10,000

Exchanges with 20 percent or more blacks or 20 
percent or more Hispanics were designated as high 
minority clusters and the other clusters were 
designated as low minority clusters. The clusters not 
found on the Donnelley tape were denoted as low 
minority clusters because there was no evidence that 
oversampling in these clusters would yield a higher 
sample of blacks or Hispanics. 
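
The classification rule just described can be written as a short Python sketch. The function name and the use of `None` to mark an exchange missing from the demographic tape are assumptions made for illustration; the 20 percent threshold and the counts are those reported above.

```python
def classify_exchange(pct_black, pct_hispanic, threshold=20.0):
    """Return 'high' or 'low'; None means no match on the demographic tape."""
    if pct_black is None or pct_hispanic is None:
        # Unmatched exchanges are treated as low minority.
        return "low"
    if pct_black >= threshold or pct_hispanic >= threshold:
        return "high"
    return "low"


# Counts reported for the initial sample of 10,000.
not_found, low_matched, high_matched = 2044, 5928, 2028
low_stratum_size = not_found + low_matched   # 7,972 low minority clusters
high_stratum_size = high_matched             # 2,028 high minority clusters
```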

The Donnelley tape used in this study was created in 
1985 and did not include newly created exchanges. 
As Mohadjer noted in her evaluation of the tapes for 
oversampling, it is likely that the effectiveness of the 
tapes will decay with time. 

The subsampling rate for the high minority clusters 
was 0.166 while for the low minority clusters the 
subsampling rate was 0.083. A systematic sample us- 
ing the appropriate sampling rate produced a sample 
of 337 clusters from the 2,028 high minority clusters. 
Similarly, a systematic sample of 663 clusters was 
selected from the 7,972 low minority clusters. 
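
A systematic sample at a fixed rate can be sketched as follows. This is an illustrative Python version, assuming a random start and a fractional interval of 1/rate; the cluster labels are placeholders. Because the published rates are rounded to three decimals, the counts from this sketch can differ by one or two from the 337 and 663 clusters reported above.

```python
import random


def systematic_sample(units, rate, rng):
    """Select every (1/rate)-th unit after a random start within the first interval."""
    interval = 1.0 / rate
    start = rng.random() * interval
    selected = []
    step = 0
    while start + step * interval < len(units):
        selected.append(units[int(start + step * interval)])
        step += 1
    return selected


rng = random.Random(0)
high_clusters = [f"H{i}" for i in range(2028)]   # placeholder labels
low_clusters = [f"L{i}" for i in range(7972)]    # placeholder labels

high_sample = systematic_sample(high_clusters, 0.166, rng)
low_sample = systematic_sample(low_clusters, 0.083, rng)
```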



Evaluation of the Oversampling Design 

Although oversampling increases the sample yield 
(decreasing the amount of screening required to locate 
rare groups), it also results in sampling rates and 
weights for the portion of the population coming from
the oversampled clusters that are different from those
in the remainder of clusters. This difference will 
create an increase in sampling variances that will 
partially reduce the effectiveness of the larger black 
and Hispanic samples. 



Comparison of Yields in the High and Low 
Minority Clusters 

A total of 15,037 households were screened in the 
NHES Field Test, of which 11,096 were fully
enumerated. The vast majority of the households 
which were not enumerated were screened and found 
to have no members in the target age groups. If all 
household members were over 65 years old, the 
household was not enumerated. During refusal 
conversion, only those households with members in 
the targeted age groups were enumerated. In 
addition, a few enumerated households with no 
race/ethnicity data were excluded from this analysis. 
For the remainder of the enumerated households, the 
race/ethnicity of the first person listed in the 
enumeration (who was the Screener respondent) was 
used as the race/ethnicity of the household. There 
were six households with missing values for the 
race/ethnicity of the first enumerated person. In 
these households, the race/ethnicity of the first person 
in the enumeration list who had a non-missing value 
for race/ethnicity was used for classification. 

Of the 11,096 fully enumerated households, 7,541 
households were enumerated within the low minority 
clusters, and 3,555 households were enumerated
within the high minority clusters. Figure 1 shows the
average number of households that had to be screened 
to locate a household with a specific type of member. 
These averages are called screening ratios. The 
figure shows that the screening ratio for Hispanic 
households was about five times higher in the low 
minority clusters than in the high minority clusters, 
and was about seven times higher in the low minority 
clusters than in the high minority clusters for black 
households. 

Another measure of the effectiveness of the 
oversampling is the ratio of the observed sample size 
with the oversampling scheme to the expected sample 
size if the same size sample had been selected 
without oversampling. The expected numbers in the 
sample were computed by reallocating the sample to 
the low and high minority clusters so that all clusters 
had the same probabilities of selection. This was
accomplished by dividing the screening sample size in
high minority clusters by two to give an expected
distribution by race/ethnicity for that stratum using
the rate applied in the low minority cluster stratum.
The screening sample sizes for both strata were then
multiplied by a constant so that the total screening
sample was equal to the observed 11,096 households.

Figure 1. Screening ratios required to locate households
of various racial/ethnic groups

Figure 2 shows the ratios of the observed to the 
expected sample sizes for black and Hispanic 
households. The figure indicates that the 
oversampling resulted in increasing the number of 
Hispanic households by nearly 30 percent and the 
number of black households by 35 percent. In terms 
of increasing the sample size of households for the 
groups, the oversampling procedure was effective. 
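
The reallocation used to obtain the expected sample sizes can be sketched in Python. The stratum totals (3,555 and 7,541 households) are taken from the text; the split of each stratum by group is made up purely for illustration, and the function name is hypothetical.

```python
def expected_without_oversampling(high, low, rate_ratio=2.0):
    """Reallocate stratum counts to equal selection probabilities, keeping the total fixed."""
    observed_total = sum(high.values()) + sum(low.values())
    # Deflate the oversampled stratum to the rate used in the low minority stratum.
    deflated_total = sum(high.values()) / rate_ratio + sum(low.values())
    # Constant that restores the observed total screening sample size.
    constant = observed_total / deflated_total
    return {g: constant * (high[g] / rate_ratio + low[g]) for g in high}


# Hypothetical group counts; only the stratum totals come from the report.
high = {"black": 1000, "hispanic": 800, "other": 1755}   # sums to 3,555
low = {"black": 300, "hispanic": 350, "other": 6891}     # sums to 7,541

expected = expected_without_oversampling(high, low)
observed = {g: high[g] + low[g] for g in high}
yield_ratio = {g: observed[g] / expected[g] for g in high}
```

A yield ratio above one means the group gained sample from oversampling; a ratio below one means it lost sample relative to an equal probability design of the same total size.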

The ratios shown in the two figures and the numbers 
used in the computation of these ratios are given in 
the detailed tables in appendix A. The detailed 
tables also give these statistics for the number of 
persons in groups, such as event and status dropouts
and 3- to 5-year-olds. In general, the screening ratios
and the ratios of observed to expected sample sizes
for the groups other than the blacks and Hispanics
were not very large. The only large ratios were the 
ratios of the observed to expected sample sizes for 
event and status dropouts. The sample sizes for the 
dropouts increased 9 percent and 5 percent, 
respectively. These gains were realized because the 
proportion of blacks and Hispanics who are dropouts
is somewhat greater than the national average. 

Statistical Efficiency of Oversampling 

Although oversampling increased the size of the 
sample for blacks and Hispanics, it also resulted in 
different sampling rates. The differential sampling 
rates increased the sampling errors of certain 
statistics computed for the NHES Field Test data. In 
this section we will compute estimates of the 
increases in sampling errors arising from the 
oversampling of high minority clusters. 



Figure 2. Expected and observed sample sizes for black
and Hispanic households

Source: 1989 National Household Education Survey Field Test



The mean design effect (which is one plus the
relative increase in variance) arising from the
different sampling rates can be approximated by the
following:

$$(k P_1 + P_2)\left(\frac{P_1}{k} + P_2\right)$$

where

$P_1$ = proportion of the total Hispanic (or black)
population in oversampled clusters;

$P_2 = 1 - P_1$; and

$k$ = ratio of sampling rates in the oversampled
exchanges to sampling rates in the remainder of
exchanges.

Another way of representing this relationship is

$$\phi = \frac{n \sum_i \sum_j w_{ij}^2}{\left(\sum_i \sum_j w_{ij}\right)^2}$$

where

$w_{ij}$ = sampling weight for the jth individual in the
ith cluster,

$i$ = 1 for high minority clusters, 2 for low minority
clusters, and

$n$ = sample size.

This statistic (φ) estimates the component of the
design effect due to differential sampling rates used
in the NHES sample design. Differences in the
sampling weights are partially due to oversampling
the high minority clusters. They are also attributable
to the use of the modified Waksberg procedure,
multiplicity sampling, differential nonresponse, and
bias adjustments among different parts of the sample.
However, the contributions of the factors associated
with all but the oversampling are virtually eliminated
when the ratio of φ for the full sample to φ for the
sample excluding the high minority clusters is
computed. These ratios, which reflect the design
effect due to the oversampling by clusters, are the
key components used in the analysis.
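
As a numerical check, the weight-based statistic can be compared with a closed-form two-stratum expression. The Python sketch below assumes the standard forms of both, φ = n Σw² / (Σw)² and (kP1 + P2)(P1/k + P2); the function names and the example values (P1 = 0.5, k = 2, and the stratum sample sizes) are illustrative assumptions, not Field Test figures.

```python
def weight_based_phi(weights):
    """n * sum(w^2) / (sum(w))^2, i.e. one plus the relative variance of the weights."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def two_stratum_design_effect(p1, k):
    """Closed form for two strata sampled at rates in the ratio k to 1."""
    p2 = 1.0 - p1
    return (k * p1 + p2) * (p1 / k + p2)


# Hypothetical domain: half of its population lies in the oversampled stratum.
p1, k = 0.5, 2.0

# Sample sizes are proportional to k*P1 and P2; weights are proportional to 1/k and 1.
weights = [1.0 / k] * 1000 + [1.0] * 500

phi_from_weights = weight_based_phi(weights)
phi_closed_form = two_stratum_design_effect(p1, k)
```

Under these assumptions both calculations give 1.125, which is the largest value the closed form can take when k = 2.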

Table 1 shows some statistics related to the efficiency 
of oversampling for the status dropout sample in the 
NHES Field Test. These statistics are provided for 
estimates of the total number of status dropouts and 
for the characteristics (e.g., age, family income, and 
sex) of status dropouts. The statistics are the 
estimated design effects due to differential sampling
rates, the ratios of observed over expected
(self-weighting) sample sizes, and the ratios of observed
over expected variances. The sample sizes for the
estimates of the number of status dropouts are equal
to the total number of youths in each of the specified
categories. The sample sizes reported for the 
estimates of characteristics of status dropouts are 
equal to the total number of status dropouts within 
each race/ethnicity category. These are the sample 
sizes that would be used to estimate any 
characteristic of status dropouts, for example gender. 

The estimated design effects in table 1 are the ratios
of φ for the overall NHES sample to φ when the
effect of oversampling the high minority clusters is
excluded. These reflect the appropriate increase in
variance due to oversampling clusters. It can be seen
that the design effects are all relatively low when
compared to the increase in sample sizes resulting
from the oversampling of high minority clusters. The
sample sizes for Hispanics and black non-Hispanics
were increased by about 40 percent for the total
sample and by about 50 percent for the status
dropouts.

Table 1. Statistical efficiency of oversampling for estimates of status dropouts

                                        Relative design    Ratio of sample
                                        effect due to      size to self-
                             Sample     oversampling       weighting          Ratio¹ of
Status dropouts              size       clusters           sample             $S_d^2$ to $S_c^2$

Number
  Total                      4,288      1.05               1.00               1.05
  Race/Ethnicity
    Non-Black, Non-Hispanic  3,311      1.05               0.92               1.14
    Black, Non-Hispanic        538      1.13               1.47               0.77
    Hispanic                   439      1.13               1.34               0.84

Characteristics
  Total                        316      1.08               1.05               1.01
  Race/Ethnicity
    Non-Black, Non-Hispanic    226      1.08               0.94               1.13
    Black, Non-Hispanic         37      1.23               1.52               0.77
    Hispanic                    53      1.25               1.41               0.83

¹ $S_c^2$ is the variance for a similar design where no oversampling is carried out for the minority groups and $S_d^2$ is the variance for the design
used in the field test. The ratio is the relative design effect due to oversampling clusters divided by the ratio of the sample sizes.

Source: 1989 National Household Education Survey Field Test.


The last column in table 1 shows the overall impact 
of the oversampling. It is the ratio of the variance 
for the Field Test design with oversampling ($S_d^2$) to
the expected variance for a sample with no
oversampling ($S_c^2$). A value of the ratio that is less
than one indicates that oversampling improves the 
precision of the estimate. For estimates of both the 
number and characteristics of status dropouts the 
ratios are smaller than one for the black and 
Hispanic domains. For estimates of the
characteristics of status dropouts, the ratios for
totals are about one.
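
The last column can be reproduced from the two columns before it, since the table 1 footnote defines the variance ratio as the relative design effect divided by the ratio of the sample sizes. The short Python sketch below uses the table 1 entries for estimates of the number of status dropouts; the function and variable names are illustrative.

```python
def variance_ratio(design_effect, sample_size_ratio):
    """Variance with oversampling relative to variance without it."""
    return design_effect / sample_size_ratio


# (relative design effect, ratio of sample size to self-weighting sample)
rows = {
    "Total": (1.05, 1.00),
    "Non-Black, Non-Hispanic": (1.05, 0.92),
    "Black, Non-Hispanic": (1.13, 1.47),
    "Hispanic": (1.13, 1.34),
}

ratios = {name: round(variance_ratio(d, r), 2) for name, (d, r) in rows.items()}
```

Values below one (here 0.77 and 0.84) indicate that the gain in sample size outweighs the variance increase from unequal weights.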

Another way of viewing these results is by 
considering the size of the equivalent simple random 
sample that is needed to achieve these precision 
levels. For example, a simple random sample 14 
percent smaller (the effective sample size is 3,688) 
could be expected to produce estimates for the non- 
black and non-Hispanic population of the same 
reliability as the Field Test. On the other hand, the 
effective sample for estimates of the Hispanic 
population is 5,274, nearly 1,000 more than the actual 
sample size. 

Table 2 shows similar statistics for the sample of 3- 
to 5-year-olds who were in any type of care or 
educational arrangement at the time of interview. 
The sample sizes reported for estimates of the
number of children in any type of care are equal to 
the total samples of 3- to 5-year-olds for each race/ 
ethnicity category. The sample sizes for 
characteristics of children in care/preschool/after 
school care and those in care with an educational 
component are equal to the corresponding sample 
sizes for each of the race/ethnicity categories. The 
increases in the design effects due to oversampling 
were low (less than 15 percent) compared to the 
increase in the size of the samples for the minority 
groups (around 40 percent). Again, the ratios of the 
observed over expected variances are less than one 
for the minority groups, close to one for the total 
sample, and higher than one for the non-black non- 
Hispanic group. These ratios indicate the 
effectiveness of the oversampling in improving the
precision for estimates of minorities, while estimates
of non-blacks and non-Hispanics are somewhat less
of non-black and non-Hispanics are somewhat less 
precise. 

Summary 

The screening ratios for blacks and Hispanics in the 
high minority clusters are substantially lower (5 to 7
times lower) than those in the low minority clusters 
for these groups. This finding suggests a 
disproportionate concentration of blacks and 
Hispanics in the high minority clusters, which is one 
condition needed to make oversampling effective. As 
a result, the sample sizes for the minority groups
were greater than would have been achieved without 
oversampling. The sample of 14- to 21-year-old 
blacks was increased by nearly 50 percent, and the 
sample of 14- to 21-year-old Hispanics was increased 
by about one-third. For 3- to 5-year-olds, the 
increases in the sample sizes for blacks and Hispanics 
were about 40 to 45 percent. 

Oversampling resulted in improving the precision of 
estimates of characteristics of blacks and Hispanics. 
The variances of these statistics were about 20 to 30 
percent less than would have been found if 
oversampling had not been used. There were losses 
in precision of 5 percent to 15 percent for the non- 
black, non-Hispanic estimates and even smaller 
precision losses for estimates of totals. These 
precision losses for overall totals and for non-black
and non-Hispanic estimates are the price paid in order
to improve the reliability of the estimates for blacks
and Hispanics.

The use of this oversampling procedure for future 
studies is recommended when the goal is to increase 
the precision of estimates for blacks and Hispanics 
and the design is similar to that used in the Field 
Test. The results from the Field Test showed that 
the method was effective in increasing the sample 
sizes for blacks and Hispanics, and did not result in 
large increases in variances for the non-black and 
non-Hispanic groups. 

Oversampling issues still need to be addressed in 
future studies, especially the amount of oversampling 
required to achieve the analytic objectives of the 
survey. Improvements in the reliability of the 
estimates for blacks and Hispanics may be important 
enough to the survey objectives to incur even greater 
losses in efficiency in estimates of totals. Revising 
the oversampling rates for either the black or the 
Hispanic exchanges can help accomplish this 
objective. 

Table 2.--Statistical efficiency of oversampling for estimates of children in some types of care 

3- to 5-year-olds in some                Sample   Relative design   Ratio of sample    Ratio of
types of arrangements                    size     effect due to     size to self-      variances(1)
                                                  oversampling      weighting sample
                                                  clusters

Number in any care/preschool 
arrangement 
  Total                                  1,527    1.06              1.01               1.05
  Race/Ethnicity 
    Non-Black, Non-Hispanic              1,158    1.05              0.92               1.14
    Black, Non-Hispanic                    178    1.11              1.49               0.77
    Hispanic                               191    1.13              1.41               0.81

Characteristics of those in care/ 
preschool/afterschool arrangement 
  Total                                  1,056    1.05              0.99               1.06
  Race/Ethnicity 
    Non-Black, Non-Hispanic                831    1.04              0.92               1.13
    Black, Non-Hispanic                    118    1.09              1.42               0.77
    Hispanic                               107    1.14              1.38               0.83

Characteristics of children in 
care with an educational component 
  Total                                    567    1.06              0.99               1.07
  Race/Ethnicity 
    Non-Black, Non-Hispanic                446    1.04              0.92               1.13
    Black, Non-Hispanic                     80    1.12              1.44               0.77
    Hispanic                                41    1.15              1.24               0.93

(1) Ratio of the variance for the design used in the field test to the variance for a similar design where no oversampling is carried out 
for the minority groups. The ratio is the relative design effect due to oversampling clusters divided by the ratio of the sample sizes. 

Source: 1989 National Household Education Survey Field Test 
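
As a check on how the last column of table 2 is formed, the following minimal Python sketch recomputes the ratio for a few rows by dividing the relative design effect by the ratio of sample sizes. The function name `variance_ratio` and the row labels are illustrative and are not part of the original report; the input values are copied from table 2.

```python
def variance_ratio(relative_deff, size_ratio):
    """Relative design effect divided by the ratio of sample sizes."""
    return relative_deff / size_ratio


# (relative design effect, ratio of sample size to self-weighting sample)
rows = {
    "Any arrangement, total": (1.06, 1.01),
    "Any arrangement, non-Black non-Hispanic": (1.05, 0.92),
    "Educational component, Hispanic": (1.15, 1.24),
}

for label, (deff, size_ratio) in rows.items():
    print(f"{label}: {variance_ratio(deff, size_ratio):.2f}")
```

Because the published inputs are themselves rounded to two decimals, a ratio recomputed this way can differ from the published figure in the last digit for some rows.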

Other methods for increasing the size of the sample 
of minorities could also be investigated. For 
example, Blair and Czaja discuss screening 
techniques for increasing the sample size for blacks. 
Waksberg's comments on this method point to some 
potential shortcomings, especially for 
Hispanics. The screening method does not appear to 
be efficient compared to the procedure used in the 
Field Test because of the distribution of the 
oversampled groups in telephone clusters for a 
national sample. 



Another factor that should be considered for future 
studies is the age of the information in the Donnelley 
tape. As Mohadjer noted, the usefulness of the data 
decays over time. Once a new tape with 1990 Census 
data is prepared, it would be prudent to purchase 
and use the new tape for future surveys. 

The above-mentioned oversampling issues are 
planned topics for study using the results from future 
NHES data collection efforts. The need for statistics 
by race and ethnicity can best be satisfied by 
continuing these research efforts. 
Expected Sample Yields with Equal Probability Sampling 

                                          Low minority clusters      Total
                                          Non-Black,                 Non-Black,
Characteristics of Sample                 Non-Hispanic    Total      Non-Hispanic    Total

No. of households enumerated                 8,330        8,979         9,579        11,096
No. of households with 14-21 year olds       2,301        2,535         2,592         3,122
No. of 14-21 year olds                       3,146        3,477         3,544         4,291
No. of event dropouts                           51           57            60            79
No. of status dropouts                         205          232           237           304
No. of households with 3-5 year olds           982        1,084         1,110         1,354
No. of 3-5 year olds                         1,090        1,212         1,234         1,515
No. of 3-5 year olds in care/ 
  preschool/school                             794          873           892         1,065
No. of 3-5 year olds in care with 
  education component                          424          469           477           572

[Expected yields for the high minority clusters, and for the Black, Non-Hispanic and Hispanic columns of the low minority clusters and the total, are illegible in the source copy.] 

Ratio of Observed over Expected Sample Size (in percent) 

                                          High minority clusters     Total
                                          (all four columns)         Black,
Characteristics of Sample                                            Non-Hispanic    Hispanic

No. of households enumerated                   168                       135            128
No. of households with 14-21 year olds         168                       147            134
No. of 14-21 year olds                         168                       147            134
No. of event dropouts                          168                       161            152
No. of status dropouts                         168                       152            141
No. of households with 3-5 year olds           168                       149            141
No. of 3-5 year olds                           168                       144            140
No. of 3-5 year olds in care/ 
  preschool/school                             168                       142            138
No. of 3-5 year olds in care with 
  education component                          168                       144            124

[Ratios for the low minority clusters, and for the Non-Black, Non-Hispanic and Total columns of the total, are illegible in the source copy.] 
This appendix contains a summary of the procedures 
used to determine the oversampling rates and the 
definitions of which exchanges were to be 
oversampled for the Field Test of the NHES. These 
recommendations were developed by 
Joseph Waksberg. When this design work was 
conducted, the only topic included in the Field Test 
was dropouts. Later investigations confirmed that the 
findings also applied to the population of 
3- to 5-year-olds. 

The method of oversampling recommended for the 
NHES Field Test was to use the Donnelley tape to 
identify telephone prefix areas with high percentages 
of blacks or Hispanics and to oversample these prefix 
areas. The implications of this method of 
oversampling were explored for the following four 
sample designs: 

1. Oversampling areas that are over 
10 percent black or Hispanic at the 
rate of 2 to 1; 

2. Oversampling areas with over 
10 percent black or Hispanic at the 
rate of 3 to 1; 

3. Oversampling areas with over 
20 percent black or Hispanic at the 
rate of 2 to 1; and 

4. Oversampling areas with over 
20 percent black or Hispanic at the 
rate of 3 to 1. 

Note that the oversampling is within prefix areas 
rather than restricted to black and Hispanic 
households. It would, of course, have been possible 
to subsample non-black and non-Hispanic households 
in high minority prefix areas to create a self-weighting 
sample for persons who are not black or 
Hispanic. However, because of the extensive 
screening required to locate a single dropout, it was 
clear that this loss in sample size would be quite 
inefficient. (It would yield only a few additional 
black and Hispanic dropouts.) 

All of the samples were constrained to achieve 18,000 
screened households. The usual practice in 
comparing sample designs is to have a constant cost 
as the constraint. This would require a cost function 
that was not developed. Furthermore, the screening 
effort is the dominant factor in the cost of data 
collection so that a constant number of screened 
households is a good approximation to what one 
would get by fixing the costs. 
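
To illustrate the fixed-total constraint, here is a minimal Python sketch of how a 2 to 1 or 3 to 1 oversampling rate splits a fixed number of screened households between high and low minority prefix areas. The share of households in high minority areas (`high_share = 0.19`) is an assumed value chosen for illustration, and the function name `allocate_screeners` is hypothetical; neither is taken from the report.

```python
def allocate_screeners(total, high_share, rate):
    """Split a fixed number of screened households between two strata.

    total      -- total screened households (the fixed constraint)
    high_share -- assumed population share of households in high minority areas
    rate       -- oversampling rate for high minority areas relative to the rest
    """
    # Each stratum's expected sample is proportional to share * relative rate,
    # scaled so that the two strata still add up to `total`.
    scale = total / (rate * high_share + (1.0 - high_share))
    high = scale * rate * high_share
    low = scale * (1.0 - high_share)
    return high, low


for rate in (1, 2, 3):
    high, low = allocate_screeners(18000, 0.19, rate)
    print(f"rate {rate} to 1: high = {high:.0f}, low = {low:.0f}")
```

With the total held fixed, every additional household screened in a high minority area is one fewer screened elsewhere, which is why oversampling lowers the sample size for groups concentrated in low minority areas.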

Before discussing the features of the various designs, 
it is worth noting that the numbers shown in the 
attached tables should be considered approximations 
rather than firm figures. The main reasons for this 
follow: 

a. The proportion of the black, 
Hispanic, and other population in 
prefix areas stratified by percent 
blacks or Hispanics is based on 
1980 data. (These data will not be 
updated until results of the 1990 
Census are available.) 

b. In estimating the percentages of 
households in the various density 
strata, the national mean household 
size for each race/ethnic group is 
applied in each density stratum. 
Also, the distribution of 14-21 year 
olds and dropouts among density 
strata was assumed to be the same 
as the total population, separately 
for each race/ethnic group; 

c. No allowance was made for 
differential undercoverage of blacks 
or Hispanics; 

d. In estimating variances and design 
effects for characteristics of 
dropouts, the population variances 
of the characteristics were assumed 
to be equal within the density 
strata. 

However, our past experience with approximations of 
this type is that they are a reasonable guide to what 
will be found in practice. 

Table B-1 is a summary of the results. It shows how 
the four oversampling options compare to a self- 
weighting sample with the same number of screeners. 
The comparisons are shown for estimates of number 
of dropouts and for their characteristics. For each of 
these, table B-1 contains data on design effects, the 
expected number of dropouts in the survey, and the 
variances. The relative design effects reflect the 
increase in variances arising from differential 
sampling rates. The variances show the combined 
effect of the design effects and the changes in sample 
size. The results are not identical for estimates of 
dropouts and their characteristics because the 
appropriate sample size for estimates of number of 
dropouts is the number of screeners and the sample 
size for characteristics of dropouts is the number of 
dropouts. 
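
The increase in variance from differential sampling rates can be approximated with Kish's formula for unequal weighting, deff = n * sum(w^2) / (sum(w))^2, where each sampled unit's weight is inversely proportional to its sampling rate. The Python sketch below applies it to a two-stratum design. This is a standard approximation rather than the calculation documented in this report, and the group shares and the function name `weighting_deff` are assumptions made for illustration.

```python
def weighting_deff(shares, rates):
    """Kish's approximate design effect from unequal sampling rates.

    shares -- population shares of the subgroup across strata (sum to 1)
    rates  -- relative sampling rate applied in each stratum
    """
    # The sample share in a stratum is proportional to share * rate, and each
    # sampled unit's weight is proportional to 1 / rate, so
    # n * sum(w^2) / (sum w)^2 reduces to the product of the two sums below.
    sample_term = sum(s * r for s, r in zip(shares, rates))
    weight_term = sum(s / r for s, r in zip(shares, rates))
    return sample_term * weight_term


# Assumed shares of each group living in (high, low) minority prefix areas.
groups = {
    "concentrated group": (0.60, 0.40),
    "dispersed group": (0.10, 0.90),
}
for name, shares in groups.items():
    for rate in (2, 3):
        deff = weighting_deff(shares, (rate, 1))
        print(f"{name}, {rate} to 1: deff = {deff:.3f}")
```

Under these assumed shares, moving from 2 to 1 to 3 to 1 raises the design effect for both groups, so the larger rate pays off only if the gain in sample size outweighs the added weighting penalty.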

The implications for the precision of the results are 
shown in the third and sixth columns of table B-1. 
As one might expect, oversampling improves the 
reliability of statistics on blacks and Hispanics, but 
there is a loss in precision for other dropouts and for 
the total. Oversampling at the rate of 3 to 1 does 
not appear to be a useful option. There is a slight 
improvement in variances for blacks, but an increase 
in variances for all other groups. Most of the 
increases are sizeable. For the 2 to 1 oversampling 
strategy, using areas with over 20 percent minority is 
somewhat better than the approach using areas with 
over 10 percent minority. Since separate analyses will 
be desired for blacks and Hispanics, this option is 
recommended. 

It should be recognized that although this appears to 
be the best option, the actual reductions in the 
expected variances will be fairly modest: 18 percent 
for blacks and 8 percent for Hispanics. The sample 
sizes were small; for blacks, the oversampling should 
increase the number of dropouts from 76 to 102 and 
for Hispanics from 89 to 102. 